Substitution of state distributions to reproduce natural prosody on HMM-based speech synthesizers
Abstract
An extension of HMM-based speech synthesis is proposed to reproduce natural speech sounds. When large amounts of speech must be compressed, speech synthesizers have an advantage in compressed data size. However, the quality of synthetic speech is often inferior to that of speech compressed by general-purpose speech codecs such as CELP, which reproduce prosodic features more accurately. We therefore propose adding complementary information to reproduce natural prosody. In the proposed method, inappropriate HMM state feature vectors selected by the conventional speech synthesis method are replaced with other vectors bound to the decision trees. Experimental results showed that substituting 20% of the state feature vectors reduces the root mean squared error (RMSE) of log F0 to 0.3 semitones, approximately 15% of the RMSE without substitution.
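The abstract does not state how states are chosen for substitution or how the side information is coded, so the following Python sketch only illustrates the general idea under explicit assumptions: each state carries a single scalar log F0 mean, the candidate replacements are the means of all leaves of the F0 decision tree, and states are ranked by how much substitution would reduce their error. The functions `substitute_states` and `rmse_semitones` and the toy data are hypothetical. The RMSE helper converts natural-log F0 differences to semitones (one octave = 12 semitones = ln 2).

```python
import math

# One octave (a factor of 2 in F0) equals 12 semitones, i.e. ln(2) in natural-log F0.
SEMITONES_PER_LN_UNIT = 12.0 / math.log(2.0)


def rmse_semitones(log_f0_a, log_f0_b):
    """RMSE between two equal-length natural-log F0 sequences, in semitones."""
    diffs = [(a - b) * SEMITONES_PER_LN_UNIT for a, b in zip(log_f0_a, log_f0_b)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def substitute_states(predicted, target, leaf_means, ratio=0.2):
    """Hypothetical substitution step (assumed, not the paper's exact procedure).

    predicted:  per-state log F0 means selected by the conventional tree traversal
    target:     per-state log F0 values measured from the natural utterance
    leaf_means: log F0 means of every leaf in the F0 decision tree
    ratio:      maximum fraction of states that may be substituted
    Returns the updated per-state means and {state index: leaf index},
    the side information an encoder would have to transmit.
    """
    candidates = []
    for i, (p, t) in enumerate(zip(predicted, target)):
        # Leaf whose mean is closest to the natural value for this state.
        best = min(range(len(leaf_means)), key=lambda k: abs(leaf_means[k] - t))
        gain = abs(p - t) - abs(leaf_means[best] - t)  # error reduction if substituted
        candidates.append((gain, i, best))

    # Spend the substitution budget on the states whose substitution helps most.
    candidates.sort(reverse=True)
    n_sub = int(round(ratio * len(predicted)))
    updated = list(predicted)
    side_info = {}
    for gain, i, best in candidates[:n_sub]:
        if gain > 0:
            updated[i] = leaf_means[best]
            side_info[i] = best
    return updated, side_info


# Toy data (invented for illustration): five states, F0 in Hz converted to log F0.
predicted = [math.log(f) for f in (120, 130, 125, 200, 110)]
target = [math.log(f) for f in (121, 131, 124, 150, 112)]
leaf_means = [math.log(f) for f in (110, 125, 150, 200)]

updated, side_info = substitute_states(predicted, target, leaf_means, ratio=0.2)
print("substituted:", side_info)
print("RMSE before: %.2f st" % rmse_semitones(predicted, target))
print("RMSE after:  %.2f st" % rmse_semitones(updated, target))
```

Ranking by error reduction is only one plausible way to spend a fixed budget such as the 20% mentioned above; the paper may use a different criterion, and a real system would substitute full state feature vectors rather than a single scalar.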
Similar resources
Speech enhancement based on hidden Markov model using sparse code shrinkage
This paper presents a new hidden Markov model-based (HMM-based) speech enhancement framework based on independent component analysis (ICA). We propose analytical procedures for training clean speech and noise models by the Baum re-estimation algorithm and present a maximum a posteriori (MAP) estimator based on a Laplace-Gaussian (for clean speech and noise, respectively) combination in the HMM ...
Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers
This paper explores the relationship between intelligibility and comprehensibility in speech synthesizers, and it designs an appropriate comprehension task for evaluating the speech synthesizers’ comprehensibility. Previous studies have predicted that a speech synthesizer with higher intelligibility will have higher performance in comprehension. Also, since the two most popular speech synthesis...
An excitation model for HMM-based speech synthesis based on residual modeling
This paper describes a trainable excitation approach to eliminate the unnaturalness of HMM-based speech synthesizers. During the waveform generation part, mixed excitation is constructed by state-dependent filtering of pulse trains and white noise sequences. In the training part, filters and pulse trains are jointly optimized through a procedure which resembles analysis-by-synthesis speech codin...
Implementation and evaluation of an HMM-based Thai speech synthesis system
This paper describes a novel approach to the realization of Thai speech synthesis. Spectrum, pitch, and phone duration are modeled simultaneously in a unified framework of HMM, and their parameter distributions are clustered independently by using a decision-tree based context clustering technique with different styles. A group of contextual factors which affect spectrum, pitch, and state durat...
Performance Analysis of Text To Speech Synthesis System Using HMM And Prosody Features With Parsing For Tamil Language
This paper describes a Hidden Markov Model (HMM)-based text-to-speech (TTS) system and a prosody-based TTS system for producing natural-sounding synthetic speech in the Tamil language. The HMM-based system consists of two phases: training and synthesis. Tamil speech is first parameterized into spectral and excitation features using Glottal Inverse Filtering (GIF). The emotions present in the input text are...